Overview

Dataset statistics

Number of variables: 28
Number of observations: 191563
Missing cells: 708906
Missing cells (%): 13.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 37.1 MiB
Average record size in memory: 203.0 B
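
The percentage and per-record figures above follow from the raw counts. The sketch below recomputes them as a sanity check; the variable names are illustrative and not part of the report. Because the total size is reported rounded to 37.1 MiB, the average record size agrees with the reported 203.0 B only approximately.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_variables = 28
n_observations = 191_563
missing_cells = 708_906
total_size_mib = 37.1  # rounded in the report

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```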

Variable types

CAT: 15
NUM: 9
BOOL: 4

Warnings

account_creation_date has a high cardinality: 190383 distinct values (High cardinality)
trial_end_date has a high cardinality: 190383 distinct values (High cardinality)
last_payment has a high cardinality: 106702 distinct values (High cardinality)
next_payment has a high cardinality: 104772 distinct values (High cardinality)
cancel_date has a high cardinality: 284 distinct values (High cardinality)
discount_price is highly correlated with monthly_price and 1 other field (High correlation)
monthly_price is highly correlated with discount_price and 1 other field (High correlation)
num_trial_days is highly correlated with monthly_price and 1 other field (High correlation)
num_trial_days is highly correlated with plan_type (High correlation)
plan_type is highly correlated with num_trial_days (High correlation)
package_type has 35361 (18.5%) missing values (Missing)
num_weekly_services_utilized has 74802 (39.0%) missing values (Missing)
preferred_genre has 36107 (18.8%) missing values (Missing)
intended_use has 3502 (1.8%) missing values (Missing)
weekly_consumption_hour has 2755 (1.4%) missing values (Missing)
num_ideal_streaming_services has 76518 (39.9%) missing values (Missing)
attribution_survey has 2602 (1.4%) missing values (Missing)
op_sys has 12986 (6.8%) missing values (Missing)
payment_type has 134856 (70.4%) missing values (Missing)
last_payment has 84275 (44.0%) missing values (Missing)
next_payment has 86098 (44.9%) missing values (Missing)
cancel_date has 159040 (83.0%) missing values (Missing)
monthly_price is highly skewed (γ1 = -32.62319544) (Skewed)
discount_price is highly skewed (γ1 = -31.4084658) (Skewed)
account_creation_date is uniformly distributed (Uniform)
trial_end_date is uniformly distributed (Uniform)
last_payment is uniformly distributed (Uniform)
next_payment is uniformly distributed (Uniform)
df_index has unique values (Unique)
subid has unique values (Unique)
join_fee has 33089 (17.3%) zeros (Zeros)

Reproduction

Analysis started: 2020-12-12 13:51:22.668997
Analysis finished: 2020-12-12 13:53:04.349768
Duration: 1 minute and 41.68 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
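
The reported duration is simply the difference between the two timestamps. A minimal sketch of that arithmetic, assuming the timestamps use the format shown above (the `FMT` string is my reading of it, not something the report states):

```python
from datetime import datetime

# Assumed timestamp layout: date, time, and microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2020-12-12 13:51:22.668997", FMT)
finished = datetime.strptime("2020-12-12 13:53:04.349768", FMT)

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```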

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 191563
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 113752.0782
Minimum: 1
Maximum: 227627
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 11325.1
Q1: 56788.5
Median: 113852
Q3: 170635.5
95-th percentile: 216214.9
Maximum: 227627
Range: 227626
Interquartile range (IQR): 113847

Descriptive statistics

Standard deviation: 65738.24676
Coefficient of variation (CV): 0.5779080943
Kurtosis: -1.200385816
Mean: 113752.0782
Median Absolute Deviation (MAD): 56920
Skewness: -0.0005193121325
Sum: 2.179068936e+10
Variance: 4321517087
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
4094 | 1 | < 0.1%
75150 | 1 | < 0.1%
105881 | 1 | < 0.1%
103832 | 1 | < 0.1%
126359 | 1 | < 0.1%
128404 | 1 | < 0.1%
118163 | 1 | < 0.1%
116114 | 1 | < 0.1%
122257 | 1 | < 0.1%
120208 | 1 | < 0.1%
Other values (191553) | 191553 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
227627 | 1 | < 0.1%
227625 | 1 | < 0.1%
227624 | 1 | < 0.1%
227623 | 1 | < 0.1%
227622 | 1 | < 0.1%
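
Several df_index statistics are derived from others: the range from the minimum and maximum, the IQR from the quartiles, the CV from the standard deviation and mean, and the variance from the standard deviation. The sketch below recomputes them from the reported figures; the variable names are illustrative only.

```python
# Figures copied from the df_index statistics in the report.
minimum, maximum = 1, 227_627
q1, q3 = 56_788.5, 170_635.5
mean, std = 113_752.0782, 65_738.24676

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 10), round(variance))
```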
 

subid
Real number (ℝ≥0)

UNIQUE

Distinct: 191563
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24986379
Minimum: 20000009
Maximum: 29999982
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 20000009
5-th percentile: 20499195.7
Q1: 22490128.5
Median: 24971020
Q3: 27487834.5
95-th percentile: 29491790.3
Maximum: 29999982
Range: 9999973
Interquartile range (IQR): 4997706

Descriptive statistics

Standard deviation: 2883893.76
Coefficient of variation (CV): 0.1154186351
Kurtosis: -1.198724491
Mean: 24986379
Median Absolute Deviation (MAD): 2498988
Skewness: 0.008486716085
Sum: 4.78646572e+12
Variance: 8.316843222e+12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
24643583 | 1 | < 0.1%
26322192 | 1 | < 0.1%
23442816 | 1 | < 0.1%
20217253 | 1 | < 0.1%
20550169 | 1 | < 0.1%
24399267 | 1 | < 0.1%
23348642 | 1 | < 0.1%
20738546 | 1 | < 0.1%
20207008 | 1 | < 0.1%
22212172 | 1 | < 0.1%
Other values (191553) | 191553 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
20000009 | 1 | < 0.1%
20000048 | 1 | < 0.1%
20000062 | 1 | < 0.1%
20000094 | 1 | < 0.1%
20000104 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
29999982 | 1 | < 0.1%
29999945 | 1 | < 0.1%
29999904 | 1 | < 0.1%
29999879 | 1 | < 0.1%
29999862 | 1 | < 0.1%

package_type
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 35361
Missing (%): 18.5%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
base | 86886 | 45.4%
enhanced | 53373 | 27.9%
economy | 15943 | 8.3%
(Missing) | 35361 | 18.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 8
Median length: 4
Mean length: 5.179559727
Min length: 3

num_weekly_services_utilized
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 74802
Missing (%): 39.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.008984164
Minimum: 0
Maximum: 14
Zeros: 2
Zeros (%): < 0.1%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 2
Median: 3
Q3: 3
95-th percentile: 5
Maximum: 14
Range: 14
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8208150659
Coefficient of variation (CV): 0.2727880976
Kurtosis: 2.353599881
Mean: 3.008984164
Median Absolute Deviation (MAD): 0
Skewness: 1.023508238
Sum: 351332
Variance: 0.6737373725
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
3 | 63701 | 33.3%
2 | 29670 | 15.5%
4 | 17269 | 9.0%
5 | 5016 | 2.6%
6 | 904 | 0.5%
7 | 138 | 0.1%
1 | 24 | < 0.1%
8 | 22 | < 0.1%
9 | 11 | < 0.1%
10 | 3 | < 0.1%
Other values (2) | 3 | < 0.1%
(Missing) | 74802 | 39.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2 | < 0.1%
1 | 24 | < 0.1%
2 | 29670 | 15.5%
3 | 63701 | 33.3%
4 | 17269 | 9.0%

Maximum 5 values

Value | Count | Frequency (%)
14 | 1 | < 0.1%
10 | 3 | < 0.1%
9 | 11 | < 0.1%
8 | 22 | < 0.1%
7 | 138 | 0.1%

preferred_genre
Categorical

MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 36107
Missing (%): 18.8%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
comedy | 97258 | 50.8%
drama | 39939 | 20.8%
regional | 8418 | 4.4%
international | 6063 | 3.2%
other | 3778 | 2.0%
(Missing) | 36107 | 18.8%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 13
Median length: 6
Mean length: 5.515767659
Min length: 3

intended_use
Categorical

MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 3502
Missing (%): 1.8%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
access to exclusive content | 69727 | 36.4%
replace OTT | 56278 | 29.4%
supplement OTT | 23233 | 12.1%
expand regional access | 13941 | 7.3%
expand international access | 12929 | 6.7%
other | 6849 | 3.6%
education | 5104 | 2.7%
(Missing) | 3502 | 1.8%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 27
Median length: 22
Mean length: 18.65402505
Min length: 3

weekly_consumption_hour
Real number (ℝ)

MISSING

Distinct: 81
Distinct (%): < 0.1%
Missing: 2755
Missing (%): 1.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.00112139
Minimum: -32.1467596
Maximum: 76.59996225
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: -32.1467596
5-th percentile: 21.50162318
Q1: 24.40153577
Median: 27.30144835
Q3: 30.20136093
95-th percentile: 37.45114239
Maximum: 76.59996225
Range: 108.7467219
Interquartile range (IQR): 5.799825165

Descriptive statistics

Standard deviation: 4.977112143
Coefficient of variation (CV): 0.1777468864
Kurtosis: 3.173344778
Mean: 28.00112139
Median Absolute Deviation (MAD): 2.899912583
Skewness: 0.6404438306
Sum: 5286835.728
Variance: 24.77164528
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
25.85149206 | 25863 | 13.5%
28.75140464 | 22848 | 11.9%
27.30144835 | 22436 | 11.7%
24.40153577 | 19869 | 10.4%
30.20136093 | 18298 | 9.6%
22.95157947 | 16732 | 8.7%
31.65131722 | 13199 | 6.9%
33.10127351 | 9378 | 4.9%
21.50162318 | 8665 | 4.5%
34.5512298 | 7398 | 3.9%
Other values (71) | 24122 | 12.6%

Minimum 5 values

Value | Count | Frequency (%)
-32.1467596 | 1 | < 0.1%
-29.24684701 | 4 | < 0.1%
-27.79689072 | 1 | < 0.1%
-23.44702185 | 1 | < 0.1%
-13.29732781 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
76.59996225 | 3 | < 0.1%
75.15000596 | 3 | < 0.1%
73.70004967 | 1 | < 0.1%
72.25009338 | 5 | < 0.1%
67.90022451 | 5 | < 0.1%

num_ideal_streaming_services
Real number (ℝ)

MISSING

Distinct: 8
Distinct (%): < 0.1%
Missing: 76518
Missing (%): 39.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.061262984
Minimum: -1
Maximum: 7
Zeros: 4
Zeros (%): < 0.1%
Memory size: 1.5 MiB

Quantile statistics

Minimum: -1
5-th percentile: 2
Q1: 2
Median: 2
Q3: 2
95-th percentile: 3
Maximum: 7
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2459331339
Coefficient of variation (CV): 0.1193118665
Kurtosis: 14.18847432
Mean: 2.061262984
Median Absolute Deviation (MAD): 0
Skewness: 3.601336973
Sum: 237138
Variance: 0.06048310636
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value | Count | Frequency (%)
2 | 107857 | 56.3%
3 | 7047 | 3.7%
1 | 89 | < 0.1%
4 | 42 | < 0.1%
5 | 4 | < 0.1%
0 | 4 | < 0.1%
7 | 1 | < 0.1%
-1 | 1 | < 0.1%
(Missing) | 76518 | 39.9%

Minimum 5 values

Value | Count | Frequency (%)
-1 | 1 | < 0.1%
0 | 4 | < 0.1%
1 | 89 | < 0.1%
2 | 107857 | 56.3%
3 | 7047 | 3.7%

Maximum 5 values

Value | Count | Frequency (%)
7 | 1 | < 0.1%
5 | 4 | < 0.1%
4 | 42 | < 0.1%
3 | 7047 | 3.7%
2 | 107857 | 56.3%

age
Real number (ℝ≥0)

Distinct: 95
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.15751248
Minimum: 0
Maximum: 90
Zeros: 64
Zeros (%): < 0.1%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 24
Q1: 35
Median: 46
Q3: 57
95-th percentile: 70
Maximum: 90
Range: 90
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 13.97657738
Coefficient of variation (CV): 0.3028017895
Kurtosis: -0.7493570161
Mean: 46.15751248
Median Absolute Deviation (MAD): 11
Skewness: 0.08003035382
Sum: 8842071.563
Variance: 195.3447151
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
50 | 6355 | 3.3%
40 | 5509 | 2.9%
60 | 5221 | 2.7%
48 | 4651 | 2.4%
47 | 4517 | 2.4%
55 | 4509 | 2.4%
49 | 4483 | 2.3%
52 | 4440 | 2.3%
43 | 4357 | 2.3%
42 | 4346 | 2.3%
Other values (85) | 143175 | 74.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 64 | < 0.1%
10 | 3 | < 0.1%
16 | 1 | < 0.1%
18 | 1097 | 0.6%
19 | 632 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
90 | 2 | < 0.1%
89 | 7 | < 0.1%
88 | 8 | < 0.1%
87 | 13 | < 0.1%
86 | 14 | < 0.1%

male_TF
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 4
Missing (%): < 0.1%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
False | 166771 | 87.1%
True | 24788 | 12.9%
(Missing) | 4 | < 0.1%

attribution_technical
Categorical

Distinct: 33
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
facebook | 62809 | 32.8%
search | 22080 | 11.5%
email | 21109 | 11.0%
organic | 18683 | 9.8%
brand sem intent google | 15821 | 8.3%
google_organic | 9553 | 5.0%
affiliate | 9318 | 4.9%
email_blast | 6321 | 3.3%
pinterest | 5634 | 2.9%
referral | 4005 | 2.1%
Other values (23) | 16230 | 8.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 23
Median length: 8
Mean length: 9.270464547
Min length: 2

attribution_survey
Categorical

MISSING

Distinct: 16
Distinct (%): < 0.1%
Missing: 2602
Missing (%): 1.4%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
facebook | 93584 | 48.9%
tv | 34737 | 18.1%
referral | 17102 | 8.9%
search | 8084 | 4.2%
pinterest | 7393 | 3.9%
other | 6423 | 3.4%
public_radio | 5897 | 3.1%
social_organic | 3735 | 1.9%
youtube | 3086 | 1.6%
podcast | 2921 | 1.5%
Other values (6) | 5999 | 3.1%
(Missing) | 2602 | 1.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 16
Median length: 8
Mean length: 6.926488936
Min length: 2

op_sys
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 12986
Missing (%): 6.8%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
iOS | 116419 | 60.8%
Android | 62158 | 32.4%
(Missing) | 12986 | 6.8%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 7
Median length: 3
Mean length: 4.297912436
Min length: 3

plan_type
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
base_uae_14_day_trial | 191032 | 99.7%
high_uae_14_day_trial | 325 | 0.2%
low_uae_no_trial | 166 | 0.1%
base_eur_14_day_trial | 18 | < 0.1%
high_sar_14_day_trial | 12 | < 0.1%
low_gbp_14_day_trial | 4 | < 0.1%
high_aud_14_day_trial | 2 | < 0.1%
high_jpy_14_day_trial | 1 | < 0.1%
base_uae_no_trial_7_day_guarantee | 1 | < 0.1%
low_eur_no_trial | 1 | < 0.1%

Frequencies of value counts

Unique

Unique: 4
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 33
Median length: 21
Mean length: 20.99565678
Min length: 16

monthly_price
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.731644867
Minimum: 0.8074
Maximum: 5.1013
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0.8074
5-th percentile: 4.7343
Q1: 4.7343
Median: 4.7343
Q3: 4.7343
95-th percentile: 4.7343
Maximum: 5.1013
Range: 4.2939
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1098266661
Coefficient of variation (CV): 0.0232110966
Kurtosis: 1088.933009
Mean: 4.731644867
Median Absolute Deviation (MAD): 0
Skewness: -32.62319544
Sum: 906408.0856
Variance: 0.01206189658
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)

Common values

Value | Count | Frequency (%)
4.7343 | 191037 | 99.7%
5.1013 | 325 | 0.2%
1.0643 | 166 | 0.1%
4.4407 | 18 | < 0.1%
4.3673 | 12 | < 0.1%
4.0003 | 2 | < 0.1%
1.1744 | 1 | < 0.1%
0.8074 | 1 | < 0.1%
4.6976 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0.8074 | 1 | < 0.1%
1.0643 | 166 | 0.1%
1.1744 | 1 | < 0.1%
4.0003 | 2 | < 0.1%
4.3673 | 12 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5.1013 | 325 | 0.2%
4.7343 | 191037 | 99.7%
4.6976 | 1 | < 0.1%
4.4407 | 18 | < 0.1%
4.3673 | 12 | < 0.1%
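
Because monthly_price takes only nine distinct values, its summary statistics can be rebuilt from the frequency table alone, which also shows where the extreme negative skewness comes from: a small cluster of prices near 1 sits far below the dominant value 4.7343. The sketch below is a hand-rolled check, not the report's own code. It uses the plain moment-based skewness, whereas the report's estimator may apply a sample-size correction (an assumption on my part) that is negligible at this number of observations.

```python
# (value, count) pairs copied from the monthly_price frequency table.
value_counts = [
    (4.7343, 191_037), (5.1013, 325), (1.0643, 166), (4.4407, 18),
    (4.3673, 12), (4.0003, 2), (1.1744, 1), (0.8074, 1), (4.6976, 1),
]

n = sum(count for _, count in value_counts)
total = sum(value * count for value, count in value_counts)
mean = total / n

# Second and third central moments, weighted by the counts.
m2 = sum(count * (value - mean) ** 2 for value, count in value_counts) / n
m3 = sum(count * (value - mean) ** 3 for value, count in value_counts) / n

sample_variance = m2 * n / (n - 1)  # unbiased (n - 1) variance
skewness = m3 / m2 ** 1.5           # moment-based skewness

print(n, round(total, 4), round(mean, 9))
print(round(sample_variance, 11), round(skewness, 3))
```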
 

discount_price
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.511846805
Minimum: 0.7707
Maximum: 5.0279
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0.7707
5-th percentile: 4.5141
Q1: 4.5141
Median: 4.5141
Q3: 4.5141
95-th percentile: 4.5141
Maximum: 5.0279
Range: 4.2572
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.105518673
Coefficient of variation (CV): 0.02338702478
Kurtosis: 1041.714144
Mean: 4.511846805
Median Absolute Deviation (MAD): 0
Skewness: -31.4084658
Sum: 864302.9096
Variance: 0.01113419036
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
4.5141 | 191032 | 99.7%
5.0279 | 325 | 0.2%
1.0276 | 166 | 0.1%
4.2205 | 18 | < 0.1%
4.0737 | 12 | < 0.1%
4.3673 | 4 | < 0.1%
4.4407 | 2 | < 0.1%
3.7801 | 2 | < 0.1%
1.1744 | 1 | < 0.1%
0.7707 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0.7707 | 1 | < 0.1%
1.0276 | 166 | 0.1%
1.1744 | 1 | < 0.1%
3.7801 | 2 | < 0.1%
4.0737 | 12 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5.0279 | 325 | 0.2%
4.5141 | 191032 | 99.7%
4.4407 | 2 | < 0.1%
4.3673 | 4 | < 0.1%
4.2205 | 18 | < 0.1%

account_creation_date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 190383
Distinct (%): 99.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
2019-07-02 14:58:45 | 3 | < 0.1%
2019-07-01 01:21:10 | 3 | < 0.1%
2020-02-29 19:26:26 | 3 | < 0.1%
2019-12-28 16:35:34 | 3 | < 0.1%
2020-02-29 17:58:30 | 3 | < 0.1%
2019-06-30 14:47:57 | 3 | < 0.1%
2019-10-26 12:15:45 | 2 | < 0.1%
2019-08-15 18:32:54 | 2 | < 0.1%
2019-07-04 17:04:47 | 2 | < 0.1%
2020-01-06 13:05:15 | 2 | < 0.1%
Other values (190373) | 191537 | > 99.9%

Frequencies of value counts

Unique

Unique: 189209
Unique (%): 98.8%

Histogram of lengths of the category

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

trial_end_date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 190383
Distinct (%): 99.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
2019-07-14 14:47:57 | 3 | < 0.1%
2020-01-11 16:35:34 | 3 | < 0.1%
2020-03-14 19:26:26 | 3 | < 0.1%
2019-07-16 14:58:45 | 3 | < 0.1%
2019-07-15 01:21:10 | 3 | < 0.1%
2020-03-14 17:58:30 | 3 | < 0.1%
2019-07-29 22:13:18 | 2 | < 0.1%
2019-07-18 18:39:31 | 2 | < 0.1%
2019-11-20 03:40:21 | 2 | < 0.1%
2020-03-12 15:11:11 | 2 | < 0.1%
Other values (190373) | 191537 | > 99.9%

Frequencies of value counts

Unique

Unique: 189209
Unique (%): 98.8%

Histogram of lengths of the category

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19
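
account_creation_date and trial_end_date are profiled as Categorical with a constant length of 19, which suggests they are timestamp strings that were never parsed as dates. As a hedged illustration, the sketch below parses three creation/trial-end pairs taken from the sample rows of this report and measures the gap between them. The format string is my reading of the values, and the 14-day gap is only checked for these three pairs, not for the whole column.

```python
from datetime import datetime, timedelta

# Assumed layout of the 19-character timestamp strings.
FMT = "%Y-%m-%d %H:%M:%S"

# (account_creation_date, trial_end_date) pairs copied from the sample rows.
pairs = [
    ("2020-03-01 15:44:35", "2020-03-15 15:44:35"),
    ("2019-12-07 16:37:06", "2019-12-21 16:37:06"),
    ("2020-02-19 18:30:15", "2020-03-04 18:30:15"),
]

# Parse both strings and subtract to get the trial length per row.
trial_lengths = [
    datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    for start, end in pairs
]

for (start, _), length in zip(pairs, trial_lengths):
    print(start, "->", length.days, "days")
```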

initial_credit_card_declined
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 187.1 KiB

Value | Count | Frequency (%)
False | 180451 | 94.2%
True | 11112 | 5.8%

join_fee
Real number (ℝ)

ZEROS

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1152957006
Minimum: -0.6606
Maximum: 0.734
Zeros: 33089
Zeros (%): 17.3%
Memory size: 1.5 MiB

Quantile statistics

Minimum: -0.6606
5-th percentile: 0
Q1: 0.0367
Median: 0.0367
Q3: 0.1101
95-th percentile: 0.6606
Maximum: 0.734
Range: 1.3946
Interquartile range (IQR): 0.0734

Descriptive statistics

Standard deviation: 0.1770207005
Coefficient of variation (CV): 1.535362546
Kurtosis: 3.123783389
Mean: 0.1152957006
Median Absolute Deviation (MAD): 0
Skewness: 2.024318696
Sum: 22086.3903
Variance: 0.03133632839
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=20)

Common values

Value | Count | Frequency (%)
0.0367 | 108927 | 56.9%
0 | 33089 | 17.3%
0.3303 | 25200 | 13.2%
0.6606 | 12219 | 6.4%
0.1101 | 10617 | 5.5%
0.367 | 1328 | 0.7%
0.1835 | 146 | 0.1%
-0.0367 | 9 | < 0.1%
0.6973 | 8 | < 0.1%
0.6239 | 6 | < 0.1%
Other values (10) | 14 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
-0.6606 | 2 | < 0.1%
-0.3303 | 1 | < 0.1%
-0.1101 | 1 | < 0.1%
-0.0367 | 9 | < 0.1%
0 | 33089 | 17.3%

Maximum 5 values

Value | Count | Frequency (%)
0.734 | 1 | < 0.1%
0.6973 | 8 | < 0.1%
0.6606 | 12219 | 6.4%
0.6239 | 6 | < 0.1%
0.5872 | 1 | < 0.1%

payment_type
Categorical

MISSING

Distinct: 6
Distinct (%): < 0.1%
Missing: 134856
Missing (%): 70.4%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
Standard Charter | 22307 | 11.6%
Paypal | 17910 | 9.3%
RAKBANK | 10177 | 5.3%
CBD | 4194 | 2.2%
Najim | 2115 | 1.1%
Apple Pay | 4 | < 0.1%
(Missing) | 134856 | 70.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 16
Median length: 3
Mean length: 5.029008733
Min length: 3

num_trial_days
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
14 | 191394 | 99.9%
0 | 169 | 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 2
Median length: 2
Mean length: 1.999117784
Min length: 1

current_sub_TF
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 187.1 KiB

Value | Count | Frequency (%)
True | 105465 | 55.1%
False | 86098 | 44.9%

payment_period
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
0 | 84275 | 44.0%
1 | 70204 | 36.6%
2 | 34773 | 18.2%
3 | 2311 | 1.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

last_payment
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 106702
Distinct (%): 99.5%
Missing: 84275
Missing (%): 44.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
2020-03-08 19:00:43 | 3 | < 0.1%
2020-02-29 12:16:18 | 3 | < 0.1%
2020-03-14 19:26:26 | 3 | < 0.1%
2020-03-15 18:08:08 | 2 | < 0.1%
2020-03-15 12:43:15 | 2 | < 0.1%
2020-02-29 14:35:44 | 2 | < 0.1%
2020-03-26 16:03:32 | 2 | < 0.1%
2020-03-02 18:23:29 | 2 | < 0.1%
2020-03-22 12:15:25 | 2 | < 0.1%
2020-02-15 21:29:54 | 2 | < 0.1%
Other values (106692) | 107265 | 56.0%
(Missing) | 84275 | 44.0%

Frequencies of value counts

Unique

Unique: 106119
Unique (%): 98.9%

Histogram of lengths of the category

Length

Max length: 19
Median length: 19
Mean length: 11.96106242
Min length: 3

next_payment
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 104772
Distinct (%): 99.3%
Missing: 86098
Missing (%): 44.9%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
2020-06-29 12:16:18 | 3 | < 0.1%
2020-07-08 19:00:43 | 3 | < 0.1%
2020-07-14 19:26:26 | 3 | < 0.1%
2020-07-25 15:36:42 | 2 | < 0.1%
2020-05-24 12:32:38 | 2 | < 0.1%
2020-04-30 15:23:34 | 2 | < 0.1%
2020-04-21 02:03:47 | 2 | < 0.1%
2020-04-15 02:54:00 | 2 | < 0.1%
2020-03-30 14:39:34 | 2 | < 0.1%
2020-06-12 02:52:49 | 2 | < 0.1%
Other values (104762) | 105442 | 55.0%
(Missing) | 86098 | 44.9%

Frequencies of value counts

Unique

Unique: 104082
Unique (%): 98.7%

Histogram of lengths of the category

Length

Max length: 19
Median length: 19
Mean length: 11.80879919
Min length: 3

cancel_date
Categorical

HIGH CARDINALITY
MISSING

Distinct: 284
Distinct (%): 0.9%
Missing: 159040
Missing (%): 83.0%
Memory size: 1.5 MiB

Value | Count | Frequency (%)
2019-07-13 00:00:00 | 430 | 0.2%
2019-07-12 00:00:00 | 391 | 0.2%
2019-07-14 00:00:00 | 382 | 0.2%
2019-07-15 00:00:00 | 346 | 0.2%
2019-07-11 00:00:00 | 321 | 0.2%
2019-07-16 00:00:00 | 313 | 0.2%
2019-07-17 00:00:00 | 300 | 0.2%
2019-07-10 00:00:00 | 286 | 0.1%
2019-07-09 00:00:00 | 257 | 0.1%
2019-07-18 00:00:00 | 254 | 0.1%
Other values (274) | 29243 | 15.3%
(Missing) | 159040 | 83.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 19
Median length: 3
Mean length: 5.716432714
Min length: 3

trial_completed
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 187.1 KiB

Value | Count | Frequency (%)
True | 167235 | 87.3%
False | 24328 | 12.7%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
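
The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself runs; `pearson_r` and the sample data are made up for the example. It also demonstrates the invariance under changes of location and scale.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of the covariance and both variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling one variable leaves r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], y)
print(r, r_shifted)
```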

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
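
As a minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. The helper names are hypothetical, and giving tied values the average of their positions is a common convention that I am assuming here rather than something the report specifies.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear

print(spearman_rho(x, y), pearson_r(x, y))
```

On this monotonic but nonlinear example ρ reaches 1 while r stays below 1, which is the difference the text describes.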

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
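
The pair-counting rule above can be written out directly. The sketch implements the formula exactly as stated (often called tau-a), with tied pairs counted as neither concordant nor discordant; libraries commonly use a tie-adjusted variant, so results on data with ties may differ from the report's. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]  # two adjacent swaps relative to x

# 8 concordant and 2 discordant pairs out of 10.
print(kendall_tau(x, y))
```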

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
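
A minimal sketch of a bias-corrected Cramér's V, following my understanding of the Bergsma (2013) correction: the χ²-based φ² is shrunk by (k-1)(r-1)/(n-1), and the row and column counts are corrected in the same spirit. Whether the report uses exactly this form is an assumption, and the function name is hypothetical. The input is a contingency table of counts with no empty row or column.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))

independent = [[10, 20], [20, 40]]  # rows are proportional
perfect = [[50, 0], [0, 50]]        # each row determines the column

print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```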

Missing values

Sample

First rows

(index) | df_index | subid | package_type | num_weekly_services_utilized | preferred_genre | intended_use | weekly_consumption_hour | num_ideal_streaming_services | age | male_TF | attribution_technical | attribution_survey | op_sys | plan_type | monthly_price | discount_price | account_creation_date | trial_end_date | initial_credit_card_declined | join_fee | payment_type | num_trial_days | current_sub_TF | payment_period | last_payment | next_payment | cancel_date | trial_completed
0 | 1 | 23383224 | base | NaN | comedy | access to exclusive content | 22.951579 | NaN | 70.0 | False | facebook | facebook | NaN | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-03-01 15:44:35 | 2020-03-15 15:44:35 | False | 0.3303 | NaN | 14 | True | 1 | 2020-03-15 15:44:35 | 2020-07-15 15:44:35 | NaN | True
1 | 2 | 26844789 | enhanced | 3.0 | regional | replace OTT | 36.001186 | 2.0 | 25.0 | True | organic | facebook | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-12-07 16:37:06 | 2019-12-21 16:37:06 | False | 0.1101 | NaN | 14 | False | 0 | NaN | NaN | NaN | True
2 | 3 | 29417030 | base | NaN | drama | replace OTT | 20.051667 | NaN | 30.0 | False | search | tv | Android | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-01-27 16:09:32 | 2020-02-10 16:09:32 | False | 0.0367 | NaN | 14 | False | 0 | NaN | NaN | NaN | True
3 | 4 | 26723159 | base | 4.0 | comedy | replace OTT | 22.951579 | 3.0 | 28.0 | False | discovery | youtube | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-10-05 12:57:07 | 2019-10-19 12:57:07 | False | 0.0367 | NaN | 14 | True | 2 | 2020-02-19 12:57:07 | 2020-06-19 12:57:07 | NaN | True
4 | 5 | 24810928 | base | NaN | comedy | access to exclusive content | 20.051667 | NaN | 70.0 | False | bing | tv | NaN | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-03-03 20:15:43 | 2020-03-17 20:15:43 | False | 0.3303 | RAKBANK | 14 | True | 1 | 2020-03-17 20:15:43 | 2020-07-17 20:15:43 | NaN | True
5 | 6 | 29726122 | base | 2.0 | comedy | access to exclusive content | 20.051667 | 2.0 | 61.0 | False | bing | search | Android | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-02-19 18:30:15 | 2020-03-04 18:30:15 | False | 0.3303 | Standard Charter | 14 | True | 1 | 2020-03-04 18:30:15 | 2020-07-04 18:30:15 | NaN | True
6 | 7 | 20299962 | base | 3.0 | drama | access to exclusive content | 34.551230 | 2.0 | 23.0 | False | email | referral | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-03-05 14:52:22 | 2020-03-19 14:52:22 | False | 0.0000 | RAKBANK | 14 | True | 1 | 2020-03-19 14:52:22 | 2020-07-19 14:52:22 | NaN | True
7 | 8 | 24930568 | base | NaN | comedy | access to exclusive content | 25.851492 | NaN | 73.0 | False | facebook | facebook | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-02-23 17:50:25 | 2020-03-08 17:50:25 | False | 0.6606 | NaN | 14 | True | 1 | 2020-03-08 17:50:25 | 2020-07-08 17:50:25 | NaN | True
8 | 9 | 23452753 | economy | 3.0 | drama | replace OTT | 28.751405 | 2.0 | 71.0 | False | search | facebook | Android | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-01-21 14:17:53 | 2020-02-04 14:17:53 | False | 0.3303 | NaN | 14 | False | 0 | NaN | NaN | 2020-01-27 00:00:00 | False
9 | 10 | 21191741 | NaN | NaN | NaN | expand regional access | 34.551230 | NaN | 53.0 | True | organic | public_radio | NaN | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-07-11 16:00:42 | 2019-07-25 16:00:42 | False | 0.0367 | Standard Charter | 14 | False | 1 | 2019-07-25 16:00:42 | NaN | 2019-08-20 00:00:00 | True

Last rows

(index) | df_index | subid | package_type | num_weekly_services_utilized | preferred_genre | intended_use | weekly_consumption_hour | num_ideal_streaming_services | age | male_TF | attribution_technical | attribution_survey | op_sys | plan_type | monthly_price | discount_price | account_creation_date | trial_end_date | initial_credit_card_declined | join_fee | payment_type | num_trial_days | current_sub_TF | payment_period | last_payment | next_payment | cancel_date | trial_completed
191553 | 227615 | 22117405 | base | NaN | comedy | access to exclusive content | 24.401536 | NaN | 30.0 | False | youtube | facebook | NaN | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-02-03 13:56:33 | 2020-02-17 13:56:33 | False | 0.0367 | Paypal | 14 | True | 1 | 2020-02-17 13:56:33 | 2020-06-17 13:56:33 | NaN | True
191554 | 227616 | 26828621 | base | 4.0 | drama | access to exclusive content | 27.301448 | 2.0 | 44.0 | False | email | facebook | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-02-26 02:12:05 | 2020-03-11 02:12:05 | False | 0.0000 | Standard Charter | 14 | True | 1 | 2020-03-11 02:12:05 | 2020-07-11 02:12:05 | NaN | True
191555 | 227618 | 22218943 | economy | NaN | comedy | replace OTT | 37.451142 | NaN | 67.0 | True | brand sem intent bing | referral | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-11-16 02:53:50 | 2019-11-30 02:53:50 | False | 0.0367 | NaN | 14 | False | 0 | NaN | NaN | 2019-11-27 00:00:00 | False
191556 | 227619 | 25492551 | base | 3.0 | comedy | access to exclusive content | 30.201361 | 2.0 | 32.0 | False | email | facebook | Android | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-09-30 22:07:37 | 2019-10-14 22:07:37 | False | 0.0000 | NaN | 14 | True | 2 | 2020-02-14 22:07:37 | 2020-06-14 22:07:37 | NaN | True
191557 | 227621 | 25549852 | enhanced | NaN | comedy | access to exclusive content | 28.751405 | NaN | 61.0 | False | affiliate | facebook | Android | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-03-06 02:57:03 | 2020-03-20 02:57:03 | False | 0.3303 | NaN | 14 | True | 1 | 2020-03-20 02:57:03 | 2020-07-20 02:57:03 | NaN | True
191558 | 227622 | 25835684 | base | 2.0 | drama | access to exclusive content | 24.401536 | 2.0 | 43.0 | False | email | pinterest | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2020-01-01 22:43:56 | 2020-01-15 22:43:56 | False | 0.0000 | NaN | 14 | True | 1 | 2020-01-15 22:43:56 | 2020-05-15 22:43:56 | NaN | True
191559 | 227623 | 21434712 | enhanced | 3.0 | comedy | supplement OTT | 28.751405 | 2.0 | 38.0 | False | facebook | facebook_organic | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-11-17 14:12:33 | 2019-12-01 14:12:33 | False | 0.3303 | NaN | 14 | True | 1 | 2019-12-01 14:12:33 | 2020-04-01 14:12:33 | NaN | True
191560 | 227624 | 25843074 | enhanced | 2.0 | comedy | replace OTT | 27.301448 | 2.0 | 49.0 | False | google_organic | referral | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-12-06 18:02:13 | 2019-12-20 18:02:13 | False | 0.3303 | Paypal | 14 | True | 1 | 2019-12-20 18:02:13 | 2020-04-20 18:02:13 | NaN | True
191561 | 227625 | 24799085 | base | NaN | comedy | access to exclusive content | 31.651317 | NaN | 45.0 | False | facebook | facebook | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-12-21 19:40:44 | 2020-01-04 19:40:44 | True | 0.0367 | NaN | 14 | True | 1 | 2020-01-04 19:40:44 | 2020-05-04 19:40:44 | NaN | True
191562 | 227627 | 20166335 | base | NaN | comedy | replace OTT | 25.851492 | NaN | 55.0 | False | organic | tv | iOS | base_uae_14_day_trial | 4.7343 | 4.5141 | 2019-11-26 19:09:09 | 2019-12-10 19:09:09 | False | 0.0367 | NaN | 14 | False | 0 | NaN | NaN | 2019-12-09 00:00:00 | False